Efficient computation of the joint sample frequency spectra for multiple populations.

Authors

  • John A Kamm
  • Jonathan Terhorst
  • Yun S Song

Abstract

A wide range of studies in population genetics have employed the sample frequency spectrum (SFS), a summary statistic that describes the distribution of mutant allele frequencies across polymorphic sites in a sample of DNA sequences and provides a highly efficient dimensional reduction of large-scale population genomic variation data. Recently, there has been much interest in analyzing the joint SFS data from multiple populations to infer parameters of complex demographic histories, including variable population sizes, population split times, migration rates, admixture proportions, and so on. SFS-based inference methods require accurate computation of the expected SFS under a given demographic model. Although much methodological progress has been made, existing methods suffer from numerical instability and high computational complexity when multiple populations are involved and the sample size is large. In this paper, we present new analytic formulas and algorithms that enable accurate, efficient computation of the expected joint SFS for thousands of individuals sampled from hundreds of populations related by a complex demographic model with arbitrary population size histories (including piecewise-exponential growth). Our results are implemented in a new software package called momi (MOran Models for Inference). Through an empirical study, we demonstrate the resulting improvements in numerical stability and computational complexity.
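To make concrete what the joint SFS summarizes, here is a minimal Python sketch, assuming each polymorphic site has already been reduced to a tuple of derived-allele counts, one entry per population. The function names `joint_sfs` and `expected_sfs_constant_size` are hypothetical and are not part of momi's API. The second function covers only the textbook special case of a single constant-size population under the standard neutral coalescent with infinite-sites mutation, where the expected number of sites with i derived alleles is θ/i; it does not implement the paper's algorithms for multi-population demographic models.

```python
from collections import Counter


def joint_sfs(derived_counts, sample_sizes):
    """Tabulate an observed joint SFS (hypothetical helper, not momi's API).

    derived_counts: iterable of tuples, one per site; entry k is the number
    of derived alleles observed in population k.
    sample_sizes: tuple (n_1, ..., n_D) of haploid sample sizes.
    Returns a Counter mapping each configuration (i_1, ..., i_D) to the
    number of sites showing that configuration.
    """
    sfs = Counter()
    total_n = sum(sample_sizes)
    for config in derived_counts:
        if len(config) != len(sample_sizes):
            raise ValueError("each site needs one count per population")
        if any(not 0 <= c <= n for c, n in zip(config, sample_sizes)):
            raise ValueError(f"derived count out of range: {config}")
        total = sum(config)
        # Skip sites that are monomorphic in the pooled sample:
        # no derived allele at all, or the derived allele in every sampled copy.
        if total == 0 or total == total_n:
            continue
        sfs[tuple(config)] += 1
    return sfs


def expected_sfs_constant_size(n, theta):
    """Expected single-population SFS, entries i = 1..n-1.

    Assumes one constant-size population, the standard neutral coalescent,
    and infinite-sites mutation with scaled rate theta: E[xi_i] = theta / i.
    """
    return [theta / i for i in range(1, n)]
```

For example, with two populations of 3 and 2 sampled haploid copies, `joint_sfs([(1, 0), (2, 1), (1, 0), (0, 0), (3, 2)], (3, 2))` keeps two sites with configuration `(1, 0)` and one with `(2, 1)`, discarding the two monomorphic sites. The Counter keeps the spectrum sparse, which matters because the full joint SFS array has one cell per configuration and grows as the product of (n_k + 1) over populations.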

Similar resources

Computation of the Likelihood of Joint Site Frequency Spectra Using Orthogonal Polynomials

In population genetics, information about evolutionary forces, e.g., mutation, selection and genetic drift, is often inferred from DNA sequence information. Generally, DNA consists of two long strands of nucleotides or sites that pair via the complementary bases cytosine and guanine (C and G), on the one hand, and adenine and thymine (A and T), on the other. With whole genome sequencing, most g...

Computation of Earthquake Response via Fourier Amplitude Spectra

A theoretical relation is presented between the seismological Fourier amplitude spectrum and the mean squared value of the elastic response, which is defined by a Gaussian distribution. By shifting a general process to its mean value, the spectrum of the mean squared value of the displacement is computed from the Fourier amplitude spectrum and the real part of the relative displacement transfer function ...

Bayesian Sample Size Determination for Joint Modeling of Longitudinal Measurements and Survival Data

A longitudinal study refers to the collection of a response variable and possibly some explanatory variables at multiple follow-up times. In many clinical studies with longitudinal measurements, the response variable for each patient is collected until an event of interest, considered as the clinical end point, occurs. Joint modeling of continuous longitudinal measurements and survival time...

Estimation of the Synthetic Unit Hydrograph Using Regional Flood Analysis and Geomorphological Parameters (Case Study: Maranj and Kani-Sawaran Watersheds, Kurdistan)

Estimation of the flood hydrograph is a necessity in hydrological studies such as flood mitigation projects. In ungauged watersheds, this estimation is usually carried out using the geomorphological characteristics of the watersheds. The objective of this research is to estimate the synthetic unit hydrograph using regional flood frequency analysis and the geomorphological parameters of watersheds. 1-hour and 2-...

Analysis of Planar Microstrip Circuits Using Three-Dimensional Transmission Line Matrix Method

The frequency-dependent characteristics of planar microstrip circuits have previously been analyzed using several full-wave approaches. All of those methods directly give the characteristics of the circuits frequency by frequency. Computation time becomes important if these planar circuits have to be studied over a very large bandwidth. The transmission line matrix (TLM) method presented in this paper ...

Journal title:
  • Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

Volume 26, Issue 1

Publication date: 2017